Analysis of clustered data when the cluster size is informative
Abstract
Clustered data arise in many scenarios. We may wish to fit a marginal regression model relating outcome measurements to covariates for cluster members. Often the cluster size (the number of members) varies. Informative cluster size (ICS) has been defined to arise when the outcome depends on the cluster size, conditional on covariates. If the clusters are considered complete, then the population of all cluster members and the population of typical cluster members have been proposed as suitable targets for inference; under ICS, inference will differ between these two populations. However, if the variation in cluster size arises from missing data, then the clusters are considered incomplete and we seek inference for the population of all members of all complete clusters. We define informative covariate structure to arise when, for a particular member, the outcome is related to the covariates for other members in the cluster, conditional on the covariates for that member and the cluster size. In this case the proposed populations for inference may be inappropriate and, just as under ICS, standard estimation methods are unsuitable. We propose two further populations and weighted independence estimating equations (WIEE) for estimation. An adaptation of generalized estimating equations (GEE) was proposed to provide inference for the population of typical cluster members and to increase efficiency, relative to WIEE, by incorporating the intra-cluster correlation. We propose an alternative adaptation which can provide superior efficiency. For each adaptation we explain how bias can arise; this bias was not clearly described when the first adaptation was originally proposed. Several authors have vaguely related ICS to the violation of the ‘missing completely at random’ assumption. We investigate which missing data mechanisms can cause ICS, which might lead to similar inference for the populations of typical cluster members and all members of all complete clusters, and we discuss implications for estimation.
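The difference between the two established target populations can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes an intercept-only marginal model, made-up data, and the two weight choices commonly used in the ICS literature (weight 1 per member for the population of all cluster members, weight 1/n_i for the population of typical cluster members). The function name `wiee_mean` is hypothetical, and the two further populations proposed in the abstract are not covered.

```python
from fractions import Fraction

def wiee_mean(clusters, weight):
    """Solve the weighted independence estimating equation
        sum_i w(n_i) * sum_j (y_ij - mu) = 0
    for mu in an intercept-only model (illustrative assumption)."""
    num = sum(weight(len(c)) * sum(c) for c in clusters)
    den = sum(weight(len(c)) * len(c) for c in clusters)
    return Fraction(num) / Fraction(den)

# Made-up data in which the outcome depends on cluster size (ICS):
# members of the single large cluster all have outcome 1.
clusters = [[0], [0], [1, 1, 1, 1]]

# Weight 1: every member counts equally (all cluster members).
mu_all = wiee_mean(clusters, lambda n: Fraction(1))

# Weight 1/n_i: every cluster counts equally (typical cluster member).
mu_typical = wiee_mean(clusters, lambda n: Fraction(1, n))

print(mu_all, mu_typical)  # 2/3 1/3
```

With equal member weights the large cluster dominates and the estimate is 2/3, whereas inverse cluster size weights give each cluster the same total weight and the estimate is 1/3. If the outcome did not depend on cluster size, the two estimands would coincide.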
Similar resources
Cluster adjusted regression for displaced subject data (CARDS): Marginal inference under potentially informative temporal cluster size profiles.
Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of cluster data we say that a condition for informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data varies from that of infer...
Marginal association measures for clustered data.
The use of correlation coefficients in measuring the association between two continuous variables is common, but regular methods of calculating correlations have not been extended to the clustered data framework. For clustered data in which observations within a cluster may be correlated, regular inferential procedures for calculating marginal association between two variables can be biased. Th...
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number ...
Efficient Estimation Methods for Informative Cluster Size Data
Based on clustered data with informative cluster size, two efficient estimation methods are proposed for marginal models. In our procedures, the information of within-cluster correlation and minimum cluster size is fully used; this is not the case with the within-cluster re-sampling (WCR) and cluster-weighted generalized estimating equation (CWGEE) methods. When the correlation model is valid a...
Regression analysis of clustered interval-censored data with informative cluster size.
Interval-censored data are commonly found in studies of diseases that progress without symptoms, which require clinical evaluation for detection. Several techniques have been suggested under an independence assumption. However, the assumption will not be valid if observations come from clusters. Furthermore, when the cluster size relates to response variables, commonly used methods can bring biased ...
Publication date: 2012